The Convergence Rate of Majority Vote under Exchangeability

Author

  • Miles E. Lopes
Abstract

When random forests are used for binary classification, an ensemble of t = 1, 2, . . . randomized classifiers is generated, and the predictions of the classifiers are aggregated by majority vote. Due to the randomness in the algorithm, there is a natural tradeoff between statistical performance and computational cost. On one hand, as t increases, the (random) prediction error of the ensemble tends to decrease and stabilize. On the other hand, larger ensembles require greater computational cost for training and making new predictions. The present work offers a new approach for quantifying this tradeoff: Given a fixed training set D, let the random variables Err_{t,0} and Err_{t,1} denote the class-wise prediction error rates of a randomly generated ensemble of size t. As t → ∞, we provide a general bound on the "algorithmic variance", var(Err_{t,l} | D) ≤ f_l(1/2)^2 / (4t) + o(1/t), where l ∈ {0, 1}, and f_l is a density function that arises from the ensemble method. Conceptually, this result is somewhat surprising, because var(Err_{t,l} | D) describes how Err_{t,l} varies over repeated runs of the algorithm, and yet, the formula leads to a method for bounding var(Err_{t,l} | D) with a single ensemble. The bound is also sharp in the sense that it is attained by an explicit family of randomized classifiers. With regard to the task of estimating f_l(1/2), the presence of the ensemble leads to a unique twist on the classical setup of non-parametric density estimation, wherein the effects of sample size and computational cost are intertwined. In particular, we propose an estimator for f_l(1/2), and derive an upper bound on its MSE that matches "standard optimal non-parametric rates" when t is sufficiently large.
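The O(1/t) decay of the algorithmic variance can be illustrated with a toy simulation. The sketch below is not the paper's method: it simply assumes a hypothetical fixed test set in which each point has a fixed probability p_i that a single randomized classifier votes correctly (standing in for conditioning on the training set D), generates many independent ensembles of size t, and measures how the ensemble error rate varies across runs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: n test points; p[i] is the chance that one
# randomized classifier votes correctly on point i. Conditioning on
# the training set D is modeled by keeping p fixed across runs.
n = 200
p = rng.uniform(0.3, 0.9, size=n)

def ensemble_error(t, runs=2000):
    """Return the error rate Err_t of a size-t majority-vote ensemble,
    sampled over `runs` independent runs of the (randomized) algorithm."""
    # votes[r, i] = number of correct votes among t classifiers for point i
    votes = rng.binomial(t, p, size=(runs, n))
    # majority vote misclassifies point i when fewer than t/2 votes are correct
    return (votes < t / 2.0).mean(axis=1)

# The run-to-run variance of Err_t should shrink roughly like 1/t.
for t in (25, 101, 401):
    err = ensemble_error(t)
    print(f"t = {t:4d}   var(Err_t | D) ≈ {err.var():.2e}")
```

Odd ensemble sizes are used so that majority votes cannot tie; the decay rate observed here is only meant to echo the 1/t behavior in the bound, not to estimate the constant f_l(1/2)^2 / 4.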


Similar Articles

signSGD: compressed optimisation for non-convex problems

Training large neural networks requires distributing learning across multiple workers, where the cost of communicating gradients can be a significant bottleneck. SIGNSGD alleviates this problem by transmitting just the sign of each minibatch stochastic gradient. We prove that it can get the best of both worlds: compressed gradients and SGD-level convergence rate. SIGNSGD can exploit mismatches ...


Testing Exchangeability On-Line

The majority of theoretical work in machine learning is done under the assumption of exchangeability: essentially, it is assumed that the examples are generated from the same probability distribution independently. This paper is concerned with the problem of testing the exchangeability assumption in the on-line mode: examples are observed one by one and the goal is to monitor on-line the streng...


Complete convergence of moving-average processes under negative dependence sub-Gaussian assumptions

The complete convergence is investigated for moving-average processes of doubly infinite sequence of negative dependence sub-gaussian random variables with zero means, finite variances and absolutely summable coefficients. As a corollary, the rate of complete convergence is obtained under some suitable conditions on the coefficients.


A Unified Analysis of Rational Voting with Private Values and Cost Uncertainty∗

We provide a unified analysis of the canonical rational voting model with privately known political preferences and costs of voting. Focusing on type-symmetric equilibrium, we show that for small electorates, members of the minority group vote with a strictly higher probability than do those in the majority, but the majority is strictly more likely to win the election. As the electorate size gr...


Chaoticity for multi-class systems and exchangeability within classes

We define a natural partial exchangeability assumption for multi-class systems with Polish state spaces, under which we obtain results extending those for exchangeable systems: the conditional law of a finite system given the vector of the empirical measures of its classes corresponds to independent uniform permutations within classes, and the convergence in law of this vector is equivalent to ...



Journal:
  • CoRR

Volume: abs/1303.0727  Issue: -

Pages: -

Publication date: 2013